Ring-oriented Block Matrix Factorization Algorithms for Shared and Distributed Memory Architectures
Abstract
Drawing on experience from implementations on shared memory multiprocessors (SMM) and distributed memory multicomputers (DMM), general ring-oriented routines are developed for the LU, Cholesky, and QR factorizations. Since all machine dependencies are confined to a small set of communication routines, the same factorization routines can be used on both the SMM and DMM architectures. The algorithms are described at a high level with a focus on the portability aspects. Further, detailed implementations of the LU factorization and machine-specific communication routines for the Alliant FX2816, Intel iPSC/2, and IBM 3090VF/600J are enclosed. Timing results show that the performance of machine-specific implementations is preserved for the general ring-oriented block algorithms.
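As a rough illustration of the ring-oriented idea, the following is a minimal sequential sketch in Python, not the routines enclosed in the paper. It assumes a block-column wrap mapping (block j is owned by processor j mod P) and a right-looking block LU with partial pivoting; the names `owner` and `ring_block_lu` are hypothetical, and the pass around the ring is only simulated by the order in which the processors' blocks are visited.

```python
def owner(block, nprocs):
    """Assumed wrap mapping: block column `block` lives on processor block mod nprocs."""
    return block % nprocs


def ring_block_lu(a, nb, nprocs):
    """Simulate a ring-oriented block LU with partial pivoting.

    a      : square matrix as a list of rows, overwritten with L (unit lower) and U
    nb     : block (panel) width
    nprocs : number of simulated processors on the ring
    Returns perm, where row i of the factored matrix is original row perm[i].
    """
    n = len(a)
    nblocks = (n + nb - 1) // nb
    perm = list(range(n))
    for k in range(nblocks):
        lo, hi = k * nb, min((k + 1) * nb, n)
        src = owner(k, nprocs)  # processor that factors panel k

        # Panel factorization on the owner: unblocked LU with partial pivoting.
        swaps = []
        for j in range(lo, hi):
            p = max(range(j, n), key=lambda i: abs(a[i][j]))
            if a[p][j] == 0.0:
                raise ValueError("matrix is singular")
            swaps.append((j, p))
            perm[j], perm[p] = perm[p], perm[j]
            for c in range(lo, hi):  # swap rows inside the panel only
                a[j][c], a[p][c] = a[p][c], a[j][c]
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]  # multiplier, stored in place of L
                for c in range(j + 1, hi):
                    a[i][c] -= a[i][j] * a[j][c]

        # The panel and its pivots travel around the ring: src, src+1, ...
        # In a real code this loop would be replaced by send/receive calls.
        for step in range(nprocs):
            q = (src + step) % nprocs
            for blk in range(q, nblocks, nprocs):  # block columns owned by q
                if blk == k:
                    continue
                clo, chi = blk * nb, min((blk + 1) * nb, n)
                for j, p in swaps:  # apply the panel's row interchanges
                    for c in range(clo, chi):
                        a[j][c], a[p][c] = a[p][c], a[j][c]
                if blk > k:  # triangular solve plus trailing update
                    for j in range(lo, hi):
                        for i in range(j + 1, n):
                            for c in range(clo, chi):
                                a[i][c] -= a[i][j] * a[j][c]
    return perm
```

In this sketch the only place that would depend on the machine is the loop over `step`, which mirrors the abstract's point that machine dependencies can be isolated in the communication layer while the factorization logic stays the same.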
Similar resources
A Ring-Oriented Approach for Block Matrix Factorizations on Shared and Distributed Memory Architectures
A block (column) wrap-mapping approach for design of parallel block matrix factorization algorithms that are (trans)portable over and between shared memory multiprocessors (SMM) and distributed memory multicomputers (DMM) is presented. By reorganizing the matrix on the SMM architecture, the same ring-oriented algorithms can be used on both SMM and DMM systems with all machine dependencies compr...
Design and Performance Modeling of Parallel Block Matrix Factorizations for Distributed Memory Multicomputers
Efficient and scalable parallel block algorithms for the LU factorization with partial pivoting, the Cholesky, and QR factorizations in a distributed memory multicomputer environment are presented. The distributed system is viewed as a ring of processors and the algorithms correspond to shared memory algorithms parallelized on block level (explicit parallelism). Performance of the algorithms are ...
Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on ...
Enhancing Parallelism of Tile QR Factorization for Multicore Architectures
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist of scheduling a Directed Acyclic Graph (DAG) of fine granularity tasks where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on mod...
High-performance and Parallel Inversion of a Symmetric Positive Definite Matrix
We present families of algorithms for operations related to the computation of the inverse of a Symmetric Positive Definite (SPD) matrix: Cholesky factorization, inversion of a triangular matrix, multiplication of a triangular matrix by its transpose, and one-sweep inversion of an SPD matrix. These algorithms are systematically derived and implemented via the Formal Linear Algebra Methodology E...
Publication date: 1992